Latent Variable Models
Christopher M. Bishop
Abstract
A powerful approach to probabilistic modelling involves supplementing a set of observed variables with additional latent, or hidden, variables. By defining a joint distribution over visible and latent variables, the corresponding distribution of the observed variables is then obtained by marginalization. This allows relatively complex distributions to be expressed in terms of more tractable joint distributions over the expanded variable space. One well-known example of a hidden variable model is the mixture distribution, in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. The structure of such probabilistic models can be made particularly transparent by giving them a graphical representation, usually in terms of a directed acyclic graph, or Bayesian network. In this chapter we provide an overview of latent variable models for representing continuous variables. We show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models, leading to the Generative Topographic Mapping (GTM) algorithm. Finally, we show how GTM can itself be extended to model temporal data.

1. Density Modelling

One of the central problems in pattern recognition and machine learning is that of density estimation, in other words the construction of a model of a probability distribution given a finite sample of data drawn from that distribution. Throughout this chapter we will consider the problem of modelling the distribution of a set of continuous variables t1, . . . , td, which we will collectively denote by the vector t.

A standard approach to the problem of density estimation involves parametric models, in which a specific form for the density is proposed which contains a number of adaptive parameters. Values for these parameters are then determined from an observed data set D = {t1, . . . , tN} consisting of N data vectors. The most widely used parametric model is the normal, or Gaussian, distribution, given by

    p(t|μ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp{ −(1/2) (t − μ)^T Σ^{−1} (t − μ) }    (1)

where μ is the mean, Σ is the covariance matrix, and |Σ| denotes the determinant of Σ.

One technique for setting the values of these parameters is that of maximum likelihood, which involves consideration of the log probability of the observed data set given the parameters, i.e.

    L(μ, Σ) = ln p(D|μ, Σ) = ∑_{n=1}^{N} ln p(tn|μ, Σ)    (2)

in which it is assumed that the data vectors tn are drawn independently from the distribution. When viewed as a function of μ and Σ, the quantity p(D|μ, Σ) is called the likelihood function. Maximization of the likelihood (or equivalently the log likelihood) with respect to μ and Σ leads to the set of parameter values which are most likely to have given rise to the observed data set. For the normal distribution (1) the log likelihood (2) can be maximized analytically, leading to the intuitive result [1] that the maximum likelihood solutions μ̂ and Σ̂ are given by

    μ̂ = (1/N) ∑_{n=1}^{N} tn    (3)

    Σ̂ = (1/N) ∑_{n=1}^{N} (tn − μ̂)(tn − μ̂)^T    (4)

corresponding to the sample mean and sample covariance respectively.

As an alternative to maximum likelihood, we can define priors over μ and Σ and use Bayes' theorem, together with the observed data, to determine the posterior distribution. An introduction to Bayesian inference for the normal distribution is given in [5].

While the simple normal distribution (1) is widely used, it suffers from some significant limitations.
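The maximum likelihood solutions (3) and (4) are easy to compute in practice. The following sketch (using NumPy, with an arbitrary synthetic two-dimensional data set chosen for illustration) fits a Gaussian by the sample mean and sample covariance; note the divisor N, rather than N − 1, in equation (4).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data set D = {t_1, ..., t_N} drawn from a 2-D Gaussian;
# the true mean and covariance below are illustrative choices.
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.8],
                       [0.8, 1.0]])
N = 5000
T = rng.multivariate_normal(true_mu, true_Sigma, size=N)  # shape (N, d)

# Maximum likelihood estimates, equations (3) and (4):
mu_hat = T.mean(axis=0)                  # sample mean
centered = T - mu_hat
Sigma_hat = centered.T @ centered / N    # sample covariance (divisor N, not N - 1)

print(mu_hat)
print(Sigma_hat)
```

With N = 5000 samples the estimates lie close to the generating parameters, illustrating that the maximum likelihood solution is well determined when the data set is large relative to d.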
In particular, it can often prove to be too flexible, in that the number of independent parameters in the model can be excessive. This problem is addressed through the introduction of continuous latent variables. On the other hand, the normal distribution can also be insufficiently flexible, since it can only represent uni-modal distributions. A more general family of distributions can be obtained by considering mixtures of Gaussians, corresponding to the introduction of a discrete latent variable. We consider each of these approaches in turn.

1.1. LATENT VARIABLES

Consider the number of free parameters in the normal distribution (1). Since Σ is symmetric, it contains d(d + 1)/2 independent parameters. There are a further d independent parameters in μ, making d(d + 3)/2 parameters in total. For large d this number grows like d², and excessively large numbers of data points may be required to ensure that the maximum likelihood solution for Σ is well determined. One way to reduce the number of free parameters in the model is to consider a diagonal covariance matrix, which has just d free parameters. This, however, corresponds to a very strong assumption, namely that the components of t are statistically independent, and such a model is therefore unable to capture the correlations between different components.

We now show how the number of degrees of freedom within the model can be controlled, while still allowing correlations to be captured, by introducing latent (or 'hidden') variables. The goal of a latent variable model is to express the distribution p(t) of the variables t1, . . . , td in terms of a smaller number of latent variables x = (x1, . . . , xq), where q < d. This is achieved by first decomposing the joint distribution p(t, x) into the product of the marginal distribution p(x) of the latent variables and the conditional distribution p(t|x) of the data variables given the latent variables.
It is often convenient to assume that the conditional distribution factorizes over the data variables, so that the joint distribution becomes

    p(t, x) = p(x) p(t|x) = p(x) ∏_{i=1}^{d} p(ti|x).    (5)

This factorization property can be expressed graphically in terms of a Bayesian network, as shown in Figure 1.
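The factorized model (5) lends itself to ancestral sampling: draw x from p(x), then draw each ti independently from p(ti|x). The sketch below uses a linear-Gaussian choice of p(ti|x) with hypothetical parameter values, anticipating the probabilistic PCA model discussed later in the chapter, and shows how the resulting marginal distribution captures correlations with far fewer parameters than a full covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
d, q = 5, 2            # data dimension d, latent dimension q < d

# Illustrative (hypothetical) parameters: a d x q factor-loading
# matrix W, a mean vector mu, and a shared noise variance sigma2.
W = rng.standard_normal((d, q))
mu = rng.standard_normal(d)
sigma2 = 0.1

# Ancestral sampling from p(t, x) = p(x) * prod_i p(t_i | x), eq. (5):
x = rng.standard_normal(q)                              # x ~ p(x) = N(0, I_q)
t = W @ x + mu + np.sqrt(sigma2) * rng.standard_normal(d)
# Each t_i depends on x only through the i-th row of W, so the d
# conditional factors p(t_i | x) are drawn independently given x.

# Marginalizing over x gives a Gaussian with covariance W W^T + sigma2 I,
# which captures correlations using d*q + d + 1 parameters rather than
# the d(d + 3)/2 of an unconstrained Gaussian (15 vs. 20 for d = 5, q = 2).
C = W @ W.T + sigma2 * np.eye(d)
print(t.shape, C.shape)
```

The comment on parameter counts makes the trade-off of Section 1.1 concrete: for large d with fixed q, the latent variable model grows linearly in d while the full-covariance Gaussian grows like d².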
Publication date: 1999